Graph integration of structured, semistructured and unstructured data for data journalism

Authors

Abstract

Digital data is a gold mine for modern journalism. However, datasets which interest journalists are extremely heterogeneous, ranging from highly structured (relational databases) and semi-structured (JSON, XML, HTML) data to graphs (e.g., RDF) and text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental organizations, or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows, especially for dynamically varying sets of data sources. We describe a complete approach for integrating dynamic sets of heterogeneous datasets along the lines described above: the challenges we faced to make such graphs useful and allow their integration to scale, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.

Similar resources

A Unified Approach to Structured, Semistructured and Unstructured Data

At the present time, the way in which we manage data depends on its structural features. In this report we propose a logical model and algebra which represents a step further in the process of bridging the gap between different data modeling approaches. In particular, the focus is on structured and semistructured data. Our model is based on set theory, as in the relational context, and on data ...

Warehousing Structured and Unstructured Data for Data

More data, especially unstructured data, is available to users than ever. There is so much data available that it is difficult for users to make use of their data in its raw form. To handle the diversity of data types, we have designed and prototyped a multidatabase/warehouse system. The system has been especially designed to facilitate the interaction of structured and unstructured data. The sys...

Structured Queries for Semistructured Probabilistic Data

We present SPOQL, a structured query language for the Semistructured Probabilistic Object (SPO) model [4]. The original query language, SPAlgebra [4], has traditional limitations such as terse functional notation and unfamiliarity to application programmers. SPOQL alleviates these problems by providing familiar SQL-like declarative syntax. We show that parsing SPOQL queries is a more involved task than ...

Ozone: Integrating Structured and Semistructured Data

Applications have an increasing need to manage semistructured data (such as data encoded in XML) along with conventional structured data. We extend the structured object database model ODMG and its query language OQL with the ability to handle semistructured data based on the OEM model and Lorel language, and we implement our extensions in a system called Ozone. In our approach, structured data...

Semantic Integration of Structured and Unstructured Data in Data Warehousing and Knowledge Management Systems

Nowadays, increasing information in enterprises demands new ways of searching and connecting the existing information systems. This chapter describes an approach for the integration of structured and unstructured data focusing on the application to Data Warehousing (DW) and Knowledge Management (KM). Semantic integration is used to improve the interoperability between two well-known and establi...

Journal

Journal title: Information Systems

Year: 2022

ISSN: 0306-4379, 1873-6076

DOI: https://doi.org/10.1016/j.is.2021.101846